policy invariance
- Asia > Macao (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Hong Kong (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
2bba9f4124283edd644799e0cecd45ca-AuthorFeedback.pdf
We thank all the reviewers for their constructive feedback. We address the key questions and concerns below. This is shown in Eq. 1 below. Therefore, this is not a valid counterexample to ρ-projection's handling of other forms of policy invariance. The ESOR values in Table 1 show the number of iterations taken to reach the expert's ESOR. However, they differ in the type of query used.
Useful Policy Invariant Shaping from Arbitrary Advice
Behboudian, Paniz, Satsangi, Yash, Taylor, Matthew E., Harutyunyan, Anna, Bowling, Michael
Reinforcement learning (RL) is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary RL research is to discover how to learn with less data. Previous work has shown that domain information can be successfully used to shape the reward; by adding additional reward information, the agent can learn with much less data. Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered. While such potential-based reward shaping (PBRS) holds promise, it is limited by the need for a well-defined potential function. Ideally, we would like to be able to take arbitrary advice from a human or other agent and improve performance without affecting the optimal policy. The recently introduced dynamic potential-based advice (DPBA) method tackles this challenge by admitting arbitrary advice from a human or other agent and improves performance without affecting the optimal policy. The main contribution of this paper is to expose, theoretically and empirically, a flaw in DPBA. Alternatively, to achieve the ideal goals, we present a simple method called policy invariant explicit shaping (PIES) and show theoretically and empirically that PIES succeeds where DPBA fails.
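The policy-invariance claim for potential-based shaping can be illustrated with a minimal sketch. It assumes the standard form of the shaping term, F(s, s') = γΦ(s') − Φ(s); the function names, the toy potential, and the trajectory below are hypothetical and do not come from the paper. Along any trajectory the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), a quantity that depends only on the endpoints, which is the intuition behind the guarantee that the optimal policy is unaltered.

```python
def shaping_reward(phi, s, s_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(phi, states, gamma):
    """Sum of gamma^t * F(s_t, s_{t+1}) along a trajectory of states."""
    total = 0.0
    for t in range(len(states) - 1):
        total += gamma ** t * shaping_reward(phi, states[t], states[t + 1], gamma)
    return total


def telescoped_value(phi, states, gamma):
    """Closed form of the same sum: gamma^T * phi(s_T) - phi(s_0)."""
    horizon = len(states) - 1
    return gamma ** horizon * phi(states[-1]) - phi(states[0])


# Hypothetical toy setup: integer states and an arbitrary potential.
phi = lambda s: 2.0 * s
trajectory = [0, 1, 2, 3]
gamma = 0.9

lhs = discounted_shaping_sum(phi, trajectory, gamma)
rhs = telescoped_value(phi, trajectory, gamma)
print(abs(lhs - rhs) < 1e-9)
```

Because the extra return depends only on the start and end states, every policy's value from a given start state shifts by the same amount in the infinite-horizon limit with a bounded potential, so the ranking of policies is unchanged.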
Expressing Arbitrary Reward Functions as Potential-Based Advice
Harutyunyan, Anna (Vrije Universiteit Brussel) | Devlin, Sam (University of York) | Vrancx, Peter (Vrije Universiteit Brussel) | Nowe, Ann (Vrije Universiteit Brussel)
Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly translating it into the specific form of dynamic advice potentials, which are maintained as an auxiliary value function learnt at the same time. We show that advice provided in this way captures the input reward function in expectation, and demonstrate its efficacy empirically.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Asia > Middle East > Jordan (0.04)
Policy Invariance under Reward Transformations for General-Sum Stochastic Games
Lu, X., Schwartz, H. M., Givigi, S. N.
We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria in a stochastic game remain unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding up convergence when learning to play a stochastic game.
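As a sketch of what the multi-player extension might look like, assume each player i has its own potential function Φ_i over states and its own reward R_i. This notation is an assumption made for illustration and is not taken from the abstract. The shaped reward for player i would then be:

```latex
% Assumed notation: R_i is player i's reward, \Phi_i its potential,
% a_1, \dots, a_n the joint action, \gamma the discount factor.
\[
  R_i'(s, a_1, \dots, a_n, s') \;=\; R_i(s, a_1, \dots, a_n, s')
  \;+\; \gamma\,\Phi_i(s') \;-\; \Phi_i(s),
  \qquad i = 1, \dots, n .
\]
```

Under this form, each player's discounted return from a start state s_0 shifts by a term that depends only on Φ_i(s_0) and not on the joint policy, which is consistent with the claim that the set of Nash equilibria is preserved.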
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Leisure & Entertainment (0.47)
- Government > Regional Government > North America Government (0.31)